Generating examples of paths summarizing RDF datasets
Abstract
As datasets become too large to be comprehended directly, a need for data summarization arises. A data summary can present typical patterns commonly found in a dataset, from which high-level understanding of the data can be obtained. Nonetheless, such abstract understanding can be improved by providing concrete examples of the summary patterns. If possible, the chosen examples should be diverse and representative of the patterns they instantiate. In this paper, we present three methods for generating examples of patterns discovered in RDF datasets. The patterns we consider are the most frequent path graphs that consist of classes of instances or data types of literals connected by RDF properties. We propose an RDF/S vocabulary for describing these path graphs and their instances. The three methods for generating path examples are random, distinct, and representative selection, based on randomization, diversification, and clustering, respectively.
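To make the three selection strategies named in the abstract more concrete, here is a minimal sketch in Python. It is not the paper's implementation: the abstract gives no algorithmic details, so the tuple-of-IRIs path representation, the Jaccard distance, and the greedy procedures standing in for diversification and clustering are all assumptions chosen for illustration.

```python
import random


def jaccard_distance(a, b):
    """Distance between two paths, compared as sets of node identifiers (assumed metric)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def random_selection(paths, k, seed=None):
    """Randomization: draw up to k path instances uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(paths, min(k, len(paths)))


def distinct_selection(paths, k):
    """Diversification (assumed greedy max-min): add the path farthest from those chosen."""
    paths = list(dict.fromkeys(paths))  # drop duplicate paths, keep order
    if not paths or k <= 0:
        return []
    chosen = [paths[0]]
    while len(chosen) < min(k, len(paths)):
        remaining = [p for p in paths if p not in chosen]
        chosen.append(
            max(remaining, key=lambda p: min(jaccard_distance(p, c) for c in chosen))
        )
    return chosen


def representative_selection(paths, k):
    """Clustering stand-in (assumed greedy medoid picking): each step adds the path
    that most reduces the total distance from every path to its nearest chosen path."""
    paths = list(dict.fromkeys(paths))
    chosen = []
    while len(chosen) < min(k, len(paths)):
        remaining = [p for p in paths if p not in chosen]

        def cost(candidate):
            medoids = chosen + [candidate]
            return sum(min(jaccard_distance(p, m) for m in medoids) for p in paths)

        chosen.append(min(remaining, key=cost))
    return chosen


# Hypothetical instances of one path pattern (person -> employer -> city).
example_paths = [
    ("ex:alice", "ex:acme", "ex:prague"),
    ("ex:bob", "ex:acme", "ex:prague"),
    ("ex:carol", "ex:initech", "ex:brno"),
    ("ex:dave", "ex:acme", "ex:prague"),
]
```

Calling `distinct_selection(example_paths, 2)` or `representative_selection(example_paths, 2)` on this toy data returns a list of two paths; a real implementation would presumably query the RDF dataset for path instances rather than take them from a hard-coded list.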
Similar resources
Aether - Generating and Viewing Extended VoID Statistical Descriptions of RDF Datasets
This paper presents the Aether web application for generating, viewing and comparing extended VoID statistical descriptions of RDF datasets. The tool is useful for example in getting to know a newly encountered dataset, in comparing datasets between versions and in detecting outliers and errors. Examples are given on how the tool has been used to shed light on multiple important datasets.
Generating RDF for Application Testing
Application testing is a critical component of application development. Testing of Semantic Web applications requires large RDF datasets, conforming to an expected form or schema, and preferably, to an expected data distribution. Finding such datasets often proves impossible, while generating input datasets is often cumbersome. The GRR (Generating Random RDF) system is a convenient, yet powerfu...
On the outer independent 2-rainbow domination number of Cartesian products of paths and cycles
Let G be a graph. A 2-rainbow dominating function (or 2-RDF) of G is a function f from V(G) to the set of all subsets of the set {1,2} such that for a vertex v ∈ V(G) with f(v) = ∅, the condition $\bigcup_{u\in N_{G}(v)}f(u)=\{1,2\}$ is fulfilled, where $N_{G}(v)$ is the open neighborhood of v. The weight of a 2-RDF f of G is the value $\omega(f):=\sum_{v\in V(G)}|f(v)|$. The 2-rainbowd...
Top-K Shortest Paths in Large Typed RDF Datasets Challenge
Perhaps the most widely appreciated linked data principle is the one that instructs linked data providers to provide useful information using the standards (i.e., RDF and SPARQL). Such information corresponds to patterns expressed as SPARQL queries that are matched against the RDF graph. Until recently, it was not possible to create a pattern without specifying the exact path that would match a...
RDF-3X: a RISC-style engine for RDF
RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and also Web 2.0 platforms. The “pay-as-you-go” nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper p...